SciVis SBIR Phase II
=Notes on the SciVis Phase II planning=
This page details the planning for year one of the SciVis Phase II SBIR. **This is a DRAFT and is likely to evolve.**

=Software Process=
==Documentation==
A combination of Doxygen and wiki pages will be used to document the developed classes. All submitted classes will have Doxygen-style comments in their header files. Doxygen HTML pages should be generated on a regular basis and posted on a web site hosted by SciberQuest. Big-picture documentation will be provided on a SciberQuest-hosted wiki.

==Versioning==
We will use a Subversion server hosted by SciberQuest for source code version control.

==Build==
We will use CMake so that our project can be configured, built, and tested seamlessly and automatically on all platforms of interest.

==Testing==
We will use CTest, the testing component of CMake, for software quality control, validation, and testing. We will configure a nightly test run on all platforms of interest and establish a nightly dashboard for reporting test results, through which we can monitor performance and correctness over time, track test coverage, and identify memory leaks. Each contributed class will be committed to the Subversion repository together with a test.

==Summary==
Note: the cost of developing the specific CMake configuration files, validation and quality control tests, and Doxygen and wiki documentation is included in the estimated cost of developing the corresponding source code components.

=Phase II Proposal Assessment=
==VTK OOC Integration==
The proposal specs out an out-of-core (OOC) paging scheme that relies on intercepting memory read/write/malloc/realloc/free operations on subclasses of vtkDataArray. The main point of contact would be vtkDataArrayTemplate. We would insert the OOC paging logic into the vtkDataArrayTemplate methods GetValue/SetValue, InsertNextValue, GetTuple/SetTuple, and InsertNextTuple, and the OOC memory management API calls into vtkDataArrayTemplate::Allocate, ResizeAndExtend, and DeepCopy.
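To make the interception point more concrete, here is a minimal sketch of what paging logic behind GetValue/SetValue could look like. It is not VTK code. The class name PagedArray, the fixed page size, the least-recently-used eviction policy, and the in-memory vector standing in for disk are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical sketch of paging logic that could sit behind GetValue/SetValue.
// The "disk" is simulated by an in-memory vector; a real OOC array would read
// and write pages through a file or memory map.
// Assumes indices are in range and maxResidentPages >= 1.
template <typename T>
class PagedArray {
 public:
  PagedArray(std::size_t size, std::size_t pageSize, std::size_t maxResidentPages)
      : disk_(size), pageSize_(pageSize), maxResident_(maxResidentPages) {}

  T GetValue(std::size_t i) { return Page(i / pageSize_)[i % pageSize_]; }

  void SetValue(std::size_t i, T v) {
    const std::size_t p = i / pageSize_;
    Page(p)[i % pageSize_] = v;
    dirty_[p] = true;  // must be written back before the page is evicted
  }

  std::size_t ResidentPages() const { return resident_.size(); }
  std::size_t PageFaults() const { return faults_; }

 private:
  struct Entry {
    std::vector<T> data;
    std::list<std::size_t>::iterator lruPos;
  };

  // Returns the resident copy of page p, loading it on a miss and evicting
  // the least recently used page when the cache is full.
  std::vector<T>& Page(std::size_t p) {
    auto it = resident_.find(p);
    if (it != resident_.end()) {
      lru_.splice(lru_.begin(), lru_, it->second.lruPos);  // now most recently used
      return it->second.data;
    }
    ++faults_;
    if (resident_.size() >= maxResident_) Evict();
    const std::size_t begin = p * pageSize_;
    const std::size_t end = std::min(begin + pageSize_, disk_.size());
    lru_.push_front(p);
    Entry e{std::vector<T>(disk_.data() + begin, disk_.data() + end), lru_.begin()};
    return resident_.emplace(p, std::move(e)).first->second.data;
  }

  // Drops the least recently used page, writing it back first if modified.
  void Evict() {
    const std::size_t victim = lru_.back();
    lru_.pop_back();
    auto it = resident_.find(victim);
    if (dirty_[victim]) {
      std::copy(it->second.data.begin(), it->second.data.end(),
                disk_.data() + victim * pageSize_);
      dirty_[victim] = false;
    }
    resident_.erase(it);
  }

  std::vector<T> disk_;
  std::size_t pageSize_;
  std::size_t maxResident_;
  std::size_t faults_ = 0;
  std::list<std::size_t> lru_;
  std::unordered_map<std::size_t, Entry> resident_;
  std::unordered_map<std::size_t, bool> dirty_;
};
```

Even on a cache hit, every access now pays for a hash lookup and a list update instead of a single pointer offset, which is the kind of overhead behind the performance concern discussed later on this page.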
The paging/swap mechanism must also be configured in the vtkDataArrayTemplate constructor (or in subclasses) so that VTK algorithms which create data arrays on the heap during their normal operation don't end up with all of the data in memory.

Containers for geometry and topology information will need modifications similar to those described for vtkDataArray and its subclasses. These containers include vtkPoints, vtkCellArray, vtkCellTypes, and vtkCellLinks.

Each VTK object and algorithm that we plan to use, directly or indirectly, will need to be examined and potentially modified. This is because VTK also provides a Get/Set Pointer/Data API that lets developers manipulate pointers to the underlying data directly, bypassing the Get/SetValue API, and using the pointer API for efficiency is common practice. To gauge the scope of what is involved, a grep of the VTK sources shows 451 files with at least one match to the regexp SGet.*Pointer. Of those 451, the most important are those in the directories Common (62 matches), Filtering (36 matches), Graphics (63 matches), Rendering (48 matches), and IO (62 matches). We would likely not attempt to modify all of the IO classes; however, the majority of the classes in Common, Filtering, Graphics, and Rendering would likely need modification. This is fairly open-ended; we may be able to get up and running initially by modifying 10-20 classes. Python and other language wrapping will likely be disabled, because the wrappers make extensive use of direct pointer manipulation to avoid deep copies of the underlying data.

The behavior of VTK readers would need to be altered so that data is not read into memory during the pipeline update. Readers would instead construct OOC objects that manage paging of the data stored on disk and insert them into the appropriate VTK data, geometry, and topology arrays.
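A reader following this approach would hand the pipeline an object that knows where an array lives on disk and reads pages on request, rather than the data itself. The sketch below assumes a headerless file of native-endian values starting at a known byte offset; the class FileBackedStore and its ReadPage method are hypothetical, and a page cache inside the OOC array would sit on top of it.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <utility>
#include <vector>

// Hypothetical OOC backing store a reader could construct instead of loading
// an array. Assumes `count` contiguous native-endian values of type T begin
// `offsetBytes` into the file at `path`.
template <typename T>
class FileBackedStore {
 public:
  FileBackedStore(std::string path, std::size_t offsetBytes, std::size_t count)
      : path_(std::move(path)), offset_(offsetBytes), count_(count) {}

  std::size_t Size() const { return count_; }

  // Reads up to pageSize values starting at element `first`. Returns fewer
  // values at the end of the array, and an empty vector on I/O failure.
  std::vector<T> ReadPage(std::size_t first, std::size_t pageSize) const {
    if (first >= count_ || pageSize == 0) return {};
    const std::size_t n = std::min(pageSize, count_ - first);
    std::vector<T> buf(n);
    std::ifstream in(path_, std::ios::binary);  // reopened per call for simplicity
    in.seekg(static_cast<std::streamoff>(offset_ + first * sizeof(T)), std::ios::beg);
    in.read(reinterpret_cast<char*>(buf.data()),
            static_cast<std::streamsize>(n * sizeof(T)));
    if (!in) return {};
    return buf;
  }

 private:
  std::string path_;
  std::size_t offset_;
  std::size_t count_;
};
```

Reopening the file on every call keeps the sketch short; a real reader would more likely keep the stream open or memory-map the file, and would report metadata such as extents, bounds, and ranges without touching the bulk data.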
Readers will also need to provide various metadata, such as extents, bounds, and ranges. We will initially support only one or two key readers.

There is a performance concern with this approach: inserting logic into the Get/SetValue API has the potential to significantly degrade the overall performance of VTK. VTK data containers are performance-critical code. Each element stored in the data array, point, and cell array types mentioned above may be accessed multiple times by each filter in a given visualization pipeline, and array lengths are typically on the order of tens to hundreds of thousands of elements, so small changes have a big impact here. In the current implementation, the Get/SetValue API is equivalent to pointer access when the calls are inlined, which often gives performance comparable to direct pointer manipulation; adding paging checks to these calls would give up that equivalence.

Even assuming that the entire VTK library was ported successfully, the modifications described above are unlikely to be accepted into the VTK trunk because of these performance concerns. However, we need not fork VTK to make this work. Instead, we can use VTK's object factory mechanism, which will allow us to swap our OOC-modified classes in or out seamlessly at run time. We will need an ongoing effort, perhaps once a year, to ensure compatibility with the latest stable release of VTK.

We should expect 2-4 months of labor to port the Python classes developed in Phase I to C++. Each C++ class introduced should have at least one dashboard test, and we should expect 1 month of labor to develop these tests. Once the Python classes have been converted to C++, we should expect 6-10 months of labor to integrate them into VTK as described above, and 1-2 months of labor fixing bugs and testing with VTK's existing dashboard tests.
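The object factory approach works because VTK objects are created through New() rather than constructed directly, and for classes that support overrides, New() first asks registered factories for a replacement (via vtkObjectFactory::CreateInstance). The self-contained sketch below mimics that pattern without depending on VTK. ArrayBase, InCoreArray, OOCArray, Factory, and NewInCoreArray are hypothetical names; a real implementation would instead subclass vtkObjectFactory and register it with vtkObjectFactory::RegisterFactory.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for a stock VTK array class and its OOC replacement.
struct ArrayBase {
  virtual ~ArrayBase() = default;
  virtual std::string ClassName() const = 0;
};
struct InCoreArray : ArrayBase {
  std::string ClassName() const override { return "InCoreArray"; }
};
struct OOCArray : ArrayBase {
  std::string ClassName() const override { return "OOCArray"; }
};

// Minimal override registry: maps a class name to a creator for its replacement.
class Factory {
 public:
  using Creator = std::function<std::unique_ptr<ArrayBase>()>;

  static void RegisterOverride(const std::string& className, Creator c) {
    Overrides()[className] = std::move(c);
  }
  static void UnRegisterAllOverrides() { Overrides().clear(); }

  // Returns nullptr when no override is registered for className.
  static std::unique_ptr<ArrayBase> CreateInstance(const std::string& className) {
    auto it = Overrides().find(className);
    if (it == Overrides().end()) return nullptr;
    return it->second();
  }

 private:
  static std::map<std::string, Creator>& Overrides() {
    static std::map<std::string, Creator> overrides;
    return overrides;
  }
};

// Analogue of a VTK New(): consult the factory first, then fall back to the stock class.
inline std::unique_ptr<ArrayBase> NewInCoreArray() {
  if (auto obj = Factory::CreateInstance("InCoreArray")) return obj;
  return std::make_unique<InCoreArray>();
}
```

Because callers only ever go through New(), registering or unregistering the override switches every array created afterwards between in-core and OOC behavior, with no change to the library itself.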
An additional 1-3 months should be allotted to develop a bank of dashboard tests that exercise the new functionality.

==G.U.I.==
To do justice to a U.I. for the SciVis project, we would likely have to implement the following set of features:
# Specialized file dialog for dealing with many files.
# VTK pipeline browser and editor, including support for pipeline branching and merging.
# The ability to save and restore visualization pipelines.
# Vis job queue, browser, and editor for manipulating scheduled vis jobs (this follows from the threaded nature of the design).
# Client-to-server communication protocol, including the serialization, transport, and deserialization of the vis queue, leading to remote construction of arbitrary VTK pipelines.
# Cell, point, block, etc. picking and selection, including the ability to subset (this may be a stretch goal).
# A control panel for each VTK reader, filter, or renderer that we expose; some of these will be quite involved, including support for VTK widgets (manipulators).
# U.I. controls for working with multi-time-step datasets, including animation generation and caching.
# Ability to add text annotations to rendered images.
# 2D, 3D, and spreadsheet views.
# Dialog for exporting generated images.
# Ability to save and restore application settings.
# A plugin architecture for the G.U.I. application, similar to that of LLNL's VisIt, to allow for modular extensibility.
# Marshalling of binary data between server and client for rendering and local interactive manipulation, or transmission of remotely rendered images.

This is quite a bit of work; we might expect to accomplish most of it in two years' time with a full-time Qt developer.
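Tallying the VTK OOC integration estimates given above, in months, in the order porting, new dashboard tests, VTK integration, bug fixing, and the additional test bank, and then adding the G.U.I. estimate taken as roughly 24 months of one full-time Qt developer:

```latex
\begin{aligned}
T_{\min}^{\mathrm{OOC}} &= 2 + 1 + 6 + 1 + 1 = 11 \\
T_{\max}^{\mathrm{OOC}} &= 4 + 1 + 10 + 2 + 3 = 20 \\
T^{\mathrm{OOC}} + T^{\mathrm{GUI}} &\approx 11 + 24 \ \text{to}\ 20 + 24 = 35 \ \text{to}\ 44
\end{aligned}
```

The OOC integration alone therefore comes to roughly 0.9 to 1.7 person-years, before the ongoing yearly VTK compatibility work, which is not quantified here.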
==Summary==

=Approach using Domain Decomposition=
==VTK Streaming Support==
==ParaView Streaming Support==
==VTK Decoupled Solution==
A solution decoupled from VTK would reduce the amount of labor, eliminate the performance degradation concerns mentioned above, and potentially free us from limitations of VTK's pipeline/IO implementation, which were developed without concern for scalability and are currently maintained for backwards compatibility. This solution would not intercept memory management calls or memory accesses deep within VTK's internals. Instead, a specialized decoupled IO layer would decompose the dataset and provide subdomains on demand. An advanced metadata structure would be used to expose the file's contents and the domain decomposition to both the U.I. and the VTK pipelines created in worker threads. The visualization server component would mediate between the IO components, the U.I., and the worker threads. VTK modifications would be minimal, and the architecture would have the potential to scale better than VTK/PV. This solution could potentially leverage VTK's existing Streaming Demand Driven Pipeline implementation, although it's not clear whether that is a wise choice without further investigation. TODO: investigate the following:
# vtkRawStridedReader
# vtkStreamingUpdateSuppressor
# vtkVisibilityPrioritizer
# vtkPieceCacheFilter
# vtkStreamingOptions
# vtkSMSOptionProxy -- factory object is established here.

==VTK Coupled Solution==
TODO: investigate the VTK Streaming Demand Driven Pipeline, the current ParaView streaming U.I. customizations, and the streaming plugins. This may well be a viable approach; my primary concern is the use of EXTENTS, a VTK concept that is applicable only to structured data and doesn't work with unstructured or AMR data.

=Threaded U.I.=
TODO: investigate the possibility of threading PV's U.I.

==Conclusion==
----
TODO
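For the VTK Decoupled Solution described above, a minimal sketch of how the decoupled IO layer might hand out subdomains on demand for a structured dataset. It assumes a VTK-style inclusive extent {imin, imax, jmin, jmax, kmin, kmax} and splits it into slabs along k whose cell counts differ by at most one, with neighboring slabs sharing one boundary plane of points so no cells are lost (similar in spirit to what vtkExtentTranslator does). The names Extent and SplitAlongK are hypothetical. As noted under VTK Coupled Solution, this extent-based scheme only covers structured data; unstructured or AMR data would need a different decomposition.

```cpp
#include <array>
#include <vector>

// Inclusive point index ranges {imin, imax, jmin, jmax, kmin, kmax}.
using Extent = std::array<int, 6>;

// Splits `whole` into up to `n` slabs along k. Adjacent slabs share one
// point plane, so every cell layer belongs to exactly one slab. Returns fewer
// than n pieces when there are fewer than n cell layers along k.
std::vector<Extent> SplitAlongK(const Extent& whole, int n) {
  std::vector<Extent> pieces;
  const int cells = whole[5] - whole[4];  // number of cell layers along k
  if (n <= 0) return pieces;
  if (cells < 1) {  // a single plane cannot be split further
    pieces.push_back(whole);
    return pieces;
  }
  if (n > cells) n = cells;
  const int base = cells / n;
  const int extra = cells % n;  // the first `extra` slabs get one more layer
  int c = 0;                    // first cell layer of the current slab
  for (int p = 0; p < n; ++p) {
    const int count = base + (p < extra ? 1 : 0);
    Extent e = whole;
    e[4] = whole[4] + c;          // first point plane of this slab
    e[5] = whole[4] + c + count;  // last point plane, shared with the next slab
    pieces.push_back(e);
    c += count;
  }
  return pieces;
}
```

For example, a whole extent of {0, 9, 0, 9, 0, 10} split three ways gives k ranges [0, 4], [4, 7], and [7, 10].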